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We discuss the impact of various improvements on simulations of dynamical overlap fermions 
using the Hybrid Monte Carlo algorithm. We focus on the usage of fat links and multiple pseudo 
fermion fields. 
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As lattice simulations of QCD push to ever smaller quark masses, they increasingly benefit from discretizations which 
respect chiral symmetry. Simulations with Nf flavors of degenerate overlap IlluJ fermions encode an exact SU(Nf) (g> 
SU(Nf) chiral symmetry, and are the cleanest theoretical description of lattice fermions known to date. They have 
several theoretical advantages as compared to the traditional formulations using Wilson or staggered quarks. Their 
correct implementation of chiral symmetry protects them from exceptional configurations which plague simulations 
with Wilson fermions. Staggered fermions have a U (l)<g)[/(l) chiral symmetry, and are probably the easiest fermions to 
q , simulate, but one staggered fermion flavor corresponds to four continuum "tastes." It is unknown whether the standard 
trick of performing staggered simulations by taking the quarter root of the fermion determinant is justifiable. Staggered 
fermions also have a complicated low energy effective field theory due to taste-breaking interactions, while the low 
energy dynamics of overlap fermions is basically identical to that of the continuum. Domain wall fermions involve 
simulations on a five-dimensional background. Flavor symmetry is unbroken, and chiral symmetry becomes exact 
I , as the length of the fifth dimension goes to infinity. Recent large scale simulations with domain wall fermions claim 
' to have good chiral behavior [3|. However, a paperfj] which appeared nearly simultaneously, describing requirements 
I/"*) , for good chiral behavior for matrix elements appropriate to CP violation in kaons, set somewhat more stringent 
requirements than the simulation claimed. We prefer to avoid these questions by working from the beginning with 
| exact chiral symmetry for dynamical fermions. 

Unfortunately, the much higher cost of applying the Dirac operator on a vector prevents us from using overlap 
fermions in large scale simulations. Until now only exploratory studies have been published 0,00- This paper is 
also only exploratory. Its goal is to investigate several methods to improve the performance of the Hybrid Monte 
Carlo (HMC) algorithm |8( and the extension proposed in ||j for these fermions. The major new ingredient is the use 
of a differentiable fat link, the "stout link" of Peardon and Morningstar0, as the gauge connection for the fermions. 
"J" 1 1 Fat links are well known to reduce the computational cost of overlap fermions [l£J, El ■ Their bottleneck has been that 
Oh, the existing forms could not be used in the standard dynamical fermion updating algorithm, Hybrid Monte Carlo. 
Stout links overcome that difficulty and give a speedup of about an order of magnitude in computer time compared 
to simulations of overlap fermions with thin links. 

The eigenvalues of the overlap Dirac operator lie on a circle in the complex plane. The overlap operator maps the 
eigenvalues of a "kernel" action onto this circle. The eigenmodes of these free kernel actions typically lie on arcs, 
and the eigenvalues of the kernel in an arbitrary background gauge field can be dense everywhere. Fat links tend to 
force the eigenvalues back onto the free-field arcs, reducing the density of eigenmodes near the center of the overlap 
circle. Golterman and Shamir [T^| have recently suggested that chiral symmetry for overlap action could be realized 
differently if the eigenmodes of the kernel whose eigenvalues lie near the center of the circle become delocalized in 
space. By reducing the density of eigenmodes, we reduce the probability that nearly degenerate modes exist and can 
mix. We will show that fat link kernel eigenmodes are more localized than thin link kernel eigenmodes. 

The major difference between dynamical simulations of overlap fermions and other kinds is the discontinuity in 
the overlap fermionic action associated with changing the topological sector. If carefully treated, away from these 
topological boundaries, the overlap fermion force in Hybrid Monte Carlo is never singular, and so a larger integration 
step size can be tolerated. For us, the force due to the gauge part of the action is actually larger than the fermion force. 
This part of the force is inexpensive to compute, and it can be easily smoothed by using the Sexton- Weingarten|l3j 
multi-scale time-step update. 

At a topological boundary, the step in the fermionic action can be treated by the algorithm proposed in [6| which 
reflects or refracts the momentum at the discontinuity in the fermionic action in analogy to classical mechanics. 
Pseudo-fermions give only a noisy estimator of the fermion action. The noise suppresses topolog ical changes. This 
problem can be partially ameliorated with the multi-pseudofermion method of Hasenbusch 14, 15]. 

The use of improved actions is accompanied by the necessity of choosing values for a large parameter set - nearly 
all of which are related to coefficients of (formally) irrelevant operators. This is only psychologically different from 
the use of more standard actions, where essentially all these parameters are set to zero value and ignored. Generally, 
there is a wide latitude in the choice of these parameters. As we proceed, we will discuss our choices (summarizing 
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everything in the conclusion), with a description of what will happen when the parameters are varied, when it is 
available. We cannot imagine that anyone would want to do simulations with precisely our choice of parameters, but 
we believe that any reasonably similar action would produce similar results. 



II. ACTION AND ALGORITHM 



A. Definitions 



To fix the conventions, let us recall the definition of the overlap Dirac operator D ov (m q ). It is constructed from 
some lattice Dirac operator (in the following called the kernel operator) d(fi) and its associated Hcrmitian Dirac 
operator /i(/i) = 7s<i(/i) with mass term /x. The massless Hcrmitian overlap operator H(0) — 75-D O v(0) is defined as 

H(0) = Ro[j 5 +e(h(-Ro))] (1) 
with Rq > the radius of the Ginsparg- Wilson circle. The matrix sign function e(h) can be defined by 

e(/0 = 4f = ^sign(A)|A)<A| (2) 

with A the eigenvalues and |A) the eigenvectors of h(—Ro). (Whenever no argument of h is given, h(—Ro) is implied.) 
The massive overlap Dirac operator is 

777 

D(m) = (l-—)D(0) + m, (3) 
and the square of the massive Hermitian overlap operator H(m) can then be written as 

Til 

%f = (l- ¥ )i?(0) 2 + m 2 (4) 



with 



H 2 (0) = R 2 Q [2 + l5 e(h(-R )) + e(h(-R Q )) l5 ] . (5) 



Since [H 2 (m), 75] = 0, H 2 can be written as a sum of two operators each of which acts only on one chiral sector, 
H 2 = H\ + PT 2 .. In a sector with topological charge Q, H 2 (Q) has, assuming the correctness of the vanishing theorem, 
\Q\ zero modes of chirality a — sign(Q) and |Q| eigenmodes with eigenvalue 4Pq in the opposite chirality. The spectra 
of H + (0) 2 and P_(0) 2 outside of eigenvalues at A = 0,4Pq are identical. Changing the topological sector therefore 
amounts to transforming a pair of eigenvalues with opposite chirality and eigenvalues A to eigenvalues and 4Pq, i.e. 
a non-continuous change of the spectrum. The rest of the spectrum will also change significantly. This will become 
important in the discussion of the rate at which the topological sector changes. 

We will construct the sign function e(h(—Ro)) by projecting out the lowest n - lg eigenmodes |A) of the kernel operator 
(with respect to modulus) and using a Zolotarev approximation of order n z for the rest of the spectrum [TtI , 

e(h) = J2 sign(A)P A + £ £ Pa), (6) 

with Pa = |^)(A| the projector on the eigenstate |A). We fix n z , hi and Ci such that the precision of the sign function 
between at least A„ eig and the maximal retained eigenvalue of the kernel is better than some given value, e.g. 10~ 7 . 

The action of the sign function on a vector </> is computed by using a multi-mass (l9| conjugate gradient (CG) to 
invert h 2 + Cj, after projecting out the eigenmodes of the kernel operator from the vector. We will refer to this as the 
inner CG as opposed to the outer CG used to invert H 2 (m). We calculate the eigenmodes by the conjugate gradient 
method of Ref. poj . We choose an accuracy for the step function and monitor the norm of the vector e(h)(f> during 
the CG iteration, stopping when either the the observed accuracy of the step function is achieved, or when the inner 
CG converges. (Of course, we wish to set the accuracy of the inner CG high enough, that the second result never 
happens.) 

As proposed in Ref. [2l| we adjust the accuracy of the inner CG, r- m , depending on the current residue of the outer 
CG r out according to 

r in = min(ro-^-,r cut ) (7) 

Tout 
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with b the source vector of the inversion, ro the target residue and r cut a maximal inner residue. Since we want to 
make sure that this does not introduce an additional inaccuracy we restart the CG after convergence with the inner 
residue fixed to ro- With a good choice for r cut fewer inner CG steps will be needed in the first pass and the restarted 
CG will converge in 1-2 steps. 

B. The kernel operator 

The choice of the kernel Dirac operator used in the construction of the overlap provides an opportunity of optimiza- 
tion. Most groups use the Wilson operator because of the low computational effort to apply it on a vector. One might 
also choose an operator which almost fulfills the Ginsparg- Wilson relation like the hyper-cubic actions proposed in 
Refs. jUI^I and tested in [23 |. This has the advantage that the number of operations to 'transform' this operator 
into the overlap operator is low, e.g. it takes a small number of inner CG steps. However, the application of the kernel 
operator alone is much more expensive than for the Wilson action, involves more communication, and the fermionic 
force is cumbersome to code. We wish to find an optimum in between these two approaches. The application of 
the operator should be as cheap as possible with respect to computation and communication. But is also should 
approximate the Ginsparg- Wilson circle in some way. 

The optimization has two parts: the choice of a free field action and the choice of a gauge connection. 

We use an action similar to the one studied in Ref. [10( which has nearest and diagonal neighbor interactions. To 
be precise let us parameterize the associated massless free action by 

S = y~] $( x ) M 7 ") + *7/*/>/i( r )] + r ) ( 8 ) 

with r connecting nearest neighbors (r — ±fi; r\ — r)\, p^ = p\) and diagonal neighbors (r = ±/t ± 0, v 7^ fi; rj = 772, 
Pn = Pv = p2- The constraint r\{r = 0) = 770 = — 8771 — 24t7 2 enforces masslessness on the spectrum, and —1 = 2pi + 12p 2 
normalizes the action to —ipij^d^ in the naive continuum limit. Thus there are three free parameters to choose. 
These three parameters can be reduced to one by requiring that each of the couplings of a fermion to its neighbors is 
a projector, proportional to 1 ± h ■ 7. This is a familiar trick for Wilson action simulations. There, one gains almost a 
factor of two in speed from the trick because the multiplication of the gauge connection times the spinor only needs 
to be done on two, not four Dirac components, and because the "gather" of a spinor on a neighboring site does not 
need all four Dirac components, only the linear combination of two components which participates in the projection. 
For nearest neighbors, a projector action corresponds to the constraint 771 = p\ (up to signs) and for the diagonal 
neighbors, 772 = V%P2- With this action, the gain in speed on a single-node computer is only about 8-10 per cent, 
since the diagonal projection itself has an overhead. Of course, on a parallel machine, projection halves the necessary 
internodal communication. 

The action we use in the simulations presented in this paper uses p\ — —1/6 and pi = —1/18. We also add a clover 
term with the tree-level clover coefficient appropriate to this action of 1.278. For these parameters it seems optimal to 
set the radius of the Ginsparg- Wilson circle Ro to 1.2 since it is closest to the behavior of the kernel for the low- lying 
eigenmodes. We tested this value by timing a quenched approximation calculation of eigenmodes, for various Rq 
values. This was done with Wilson action gauge fields, at coupling (3 = 5.9, or a lattice spacing of about 0.13 fm. One 
could vary Rq by ±0.2 without much effect. In Fig. ^ we show the free field spectrum of the kernel Dirac operator we 
use together with several Ginsparg- Wilson circles. Let us remark that contrary to the Wilson operator, which has a 
maximal eigenvalue at 16, this operator has it at about 4. This alone reduces the condition number k = A ma x/A m i n 
of the h 2 , which controls the convergence of the inner CG, by a factor of about 16. 

The other ingredient of our kernel action, the stout link as a gauge connection, is so important that it deserves a 
separate discussion below, in Sec. IIVI 

C. Hybrid Monte Carlo 

We consider simulations with two flavors of degenerate-mass fermions. In a completely standard approach, we add 
to the gauge action a set of momenta Tr^(n) (with total energy i ^2 n 7r M (n) 2 ). To include the fermions, we introduce 
their action using pseudo-fermions 

S f [U,tf,<j>] =tfH(m)- 2 <l> (9) 

where the pseudo-fermion fields <fi are a set of four-component spinors. At the beginning of each trajectory we perform a 
heat-bath initialization of the fermion action by casting a vector of four-component colored Gaussian random numbers 
R and initialize <p = HR. 
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FIG. 1: Eigenmode spectrum of the kernel action used in this work on a finite lattice, with superimposed circles (which show 
the spectrum of overlap actions). The five circles have radius 1.0, 1.2, 1.4, 1.6, 1.8. 



We want to study the effect of additional pseudo-fermion fields as advocated in Refs. [lj, [15J on the performance 
of the HMC algorithm. One therefore rewrites the fermion determinant of two degenerate fermions as 

Np ~ 1 R 2 ( \ 

dctH 2 (m) = d C tH 2 (m Np ) TT dot Tj2( Km \ (10) 

with mi = to and < m^+i with suitably chosen larger masses. Using pseudo-fermions <f>i to deal with the 
determinants, this leads as usual to the effective action 



s eS [u, <j>t,<t>i] = s g [u] + s f [u, <j>t, (ii) 



with S g [U] the gauge action and 



JV D -1 



S f [U, , = ^H- 2 {m Np )^ + £ d>+ H { ™ J+1 h 3 ■ ( V2 ! 

j= i \ J) 



To calculate the fermion force one has to know the variation of the fermion action. Since [H 2 ,j^] = one can 
split the fermionic action into two contributions of definite chirality. In the following we restrict ourselves to one 
pseudo-fermion field. The generalization to multiple fields is trivial. For a chiral source <j) with chirality a = ±1 we 
follow |8| and |5j and compute the simulation time derivative of the fermionic action 

±S f [U, </>+, 0] = 4> + ^H- 2 4> - -2{Rl - ^)a^ + -^-em (13) 
dr dr 4 dr 

with tp = H~ 2 cf). This is only valid as long as no eigenvalue of h(—Ro) is zero. Since an eigenvalue changing sign 
causes a step in the action, we get an additional contribution 

±S f [U,ct> + ,cj>}\ d . 3c = 5(X)ASflU]\ . (14) 

It is very difficult to approximate the derivative of the step function by the derivative of its Zolotarev approximation, 
because near topological boundaries eigenmodes of h will develop arbitrarily small eigenvalues. In order to compute 
the variation of the approximation to the sign function Eq. ©, we have to know the variation of the projectors. This 
can be taken into account by using first order perturbation theory |25j, 

SPx = - PxWiPx + PxSh+j-^il - P A ). (15) 
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The derivative of the approximation of the sign function used in Eq. (|13[l thus becomes 

^+±e(h)i> = V+(l - P^mO- -P)^ + Y,{- Xxhpx - Pt k ^ + <*Mhpx + Pthpx)} (16) 

A 

with P = P* ano - ^C 1 ) the Zolotarev approximation to the sign function. The \ an d p are defined as 

pa = 3^(1- W; pa = Pa^; X x = j^e(h){i - P)ip . (17) 

The derivative of the Zolotarev approximation can be simplified to 

The term in Eq. I|14|) comes from the derivative of the step function at A = where the height of the step in the 
fermionic action is AS/. For this term we use now the Feynman-Hellmann theorem 

A = x + h X (19) 

with x the zero-mode of the kernel operator h. This force acts only if the eigenvalue A of the kernel operator h(—Ro) 
changes sign. At these points the topology changes, because using Eqs. and 

Q = TrH m = sign(A) . (20) 

A 

To compute the height of this step in the action, we evolve the gauge fields in simulation time onto the A = surface. 
Then we compute Sf on both sides of the surface, where we just change the sign of the value of the crossing eigenmode. 
This is the only source of discontinuity. Anything else which might change across this surface does so continuously 
and is thus captured by the equations of motion. Then we use the prescription of Ref. to alter the momentum tt 
depending on whether the step in potential AS/ is high enough to reflect at the surface or whether one can refract 
into the other topological sector. The new momenta then are 



An = i -N (N\n) + N sign(iV|7r) ^(A^vr) 2 - 2AS/ if (N\n) 2 > 2AS> 

' -2N(N\ir) if (A^lvr) 2 < 2AS> "" 



where AS*/ is the height of the discontinuity, N the vector normal to the surface of zero eigenvalue and tt the molecular 
dynamics momenta. The scalar product is defined by 

(7V k )=£Tr(iV+^. (22) 

X 

Using Eq. (|19|l , the normal vector N is computed using the derivative of the kernel action with respect to U 



Sh 



(23) 
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with \x) the zero- mode of the kernel operator and we take the traceless anti-Hermitian part so that n itself stays 
traceless and anti-Hermitian. 

A word on the evaluation of p\. Since A is an eigenvalue of h, the matrix 1/(A — h) is singular. Although 
1/(A — h) (1 — P\) is well defined, in a numerical calculation A and | A) may not be sufficiently accurately known for the 
inversion to be stable. In our simulation we therefore compute 1/(A + Si — h)(l — P\)ip for several 8i and interpolate 
to S = afterward. 

In our simulation the kernel operator is constructed from stout links. Thus, after calculating the variation of the 
kernel operator Sh with respect to the stout links one has to apply the chain rule as described in Ref. [9( to get the 
variation with respect to the unsmeared link variable U(x,fi). 

In order to maintain a high acceptance rate, it is crucial to monitor whether a mode has crossed the zero eigenvalue 
surface. Fortunately, functions of the gauge field variables, including eigenmodes of the kernel, evolve slowly in 
simulation time. The identification of the modes before and after a molecular dynamics time step is done by computing 
their scalar products. If the product of the vector at the beginning of the step with one at the end has an absolute 
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value close to one, the two modes are identified. A technical problem occurs if the modes become degenerate during 
the time step since the wave functions can mix in this case. However, enough similarity remains to identify the pair 
of the two modes which have mixed. If we have identified that at least the eigenvalue of one mode has changed sign 
we use the van Wijngaarden-Dekker-Brent method |26| to find the first point in simulation time where an eigenvalue 
vanishes. 

To integrate the equations of motion we employ a Sexton- Weingarten scheme setting the time-scale of the gauge 
updates to 1/nsw of the scale for the fermionic updates St. Using the following notation for the updates 

T PG (r) : 7r -> 7r - iTSuS g [U] 

T PF (r) :tt -> tt - WSuSf [U] (24) 
Tu(t) : U -> e iT7V U 

the updating algorithm is composed of the following elementary steps: 

1) Update momenta with fermion force: 

tt -> T pf (St/2)tt (25) 

2) Update gauge fields: Do nsw steps of the gauge update: 

[T PG (ST/(2n sw ))T v (ST/n sw )T PG (ST/(2n sw ))] nsw (26) 

3) Calculate eigenmodes of the kernel operator, decide whether an eigenmode has changed sign. If this is the case 
go to the time when this has happened, change the momenta according to Eq. I|21|) and evolve until the end of 
the time step. 

4) Update momenta with fermion force. 

tt T pf (5t/2)tt (27) 



III. SIMULATION PARAMETERS 



We wanted to test the algorithm at parameters appropriate to simulations which could be used to do physics. 
Quenched overlap simulations become increasingly difficult as the gauge coupling is carried into the strong coupling 
regime, as the cost of the inner CG grows. We also fear encountering the region of the Golterman-Shamir vanishing 
mobility edge^|. Thus, the largest lattice spacing a which we feel we can tolerate is at about a — 1/(6T C ) or about 
0.2 fm at a nominal Nf — 2 deconfinement temperature of 150 MeV. For our first round of tests, we do not need 
an accurate determination of a. Accordingly, we equilibrated 6 4 volumes at parameter values near their values for 
deconfinement. The crossover from the confined phase will be very rounded, and the deconfinement phase will not 
be very physical, since Polyakov loops in all directions will order, but we will still have a rough idea of the lattice 
spacing. After the simulation parameters are optimized, we can go to bigger spatial volumes. 

We simulated using the Liischer-Weisz gauge action [2?} ■ We approximated the tadpole improved coefficients of 
Ref. [23. Explicitly the action reads 

S[U] = /?l J2 I Re Tr I 1 - U Pl\ + #» J2 l Re Tr t 1 - U rt] + 03 J2 l Re Tr t 1 - U P9\ ' ( 28 ) 



^2 = - [! + 0.4805 a] , = - % 0.03325 a, (29) 



^2 [1 + 0.4805a] , /3 3 = - -. 
with U p i the plaquette, U r t the 1x2 rectangle, and U pg the perimeter-6 "parallelogram" loop, and 



n vi/4 In ( iRe Tr(U pl )) 

u„ = { -Re Tr(U pl ) ) , a = ^ '- . (30) 

V3 x P 'J 3.06839 v ; 

After running on 6 4 lattices using about 200 trajectories we measured uq — 0.86 for several parameter sets, with 
little variation. We therefore decided to set uq = 0.86 for all our production runs. Even though this is only a rough 
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estimate and not a self consistent determination of uq, we expect this to capture most of the effect of the tadpole 
improvement. 

In nearly all runs we applied two levels of isotropic stout smearing and chose the smearing parameter p = 0.15, see 
Sec. IIVI This value was determined by maximizing the average value of the plaquette constructed from the fat links 
on a number of quenched configurations. This is the same strategy as used in Ref. p9j to determine the parameters 
of HYP smearing. We believe that any value in the range between 0.1 and 0.2 would work nearly as well. 

We performed runs at bare quark masses am q = 0.05, 0.1 at (3 values such that we ran at both sides of the (pseudo) 
phase transition. A plot of the plaquette vs. gauge coupling [3 is shown in Fig. [21 In order to illustrate the change in 
the Polyakov loop as a function of (3 we show a scatter plot at am q = 0.1 in Fig. EJ 
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FIG. 2: Plaquette vs gauge coupling (3 = W/g 2 from dynamical overlap simulations on 6 4 lattices with quark masses am = 0.05 
and 0.1. 
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FIG. 3: Scatter plot of the Polyakov loop with quark masses am — 0.1 on the 6 lattices for various values of (3. 



We set the precision of the step function to be 10 -7 and ran the CG for the fermionic action to a residue of 10 -6 . 
For the HMC algorithm we chose to set the number of gauge update steps nsw = 12 since this provided us with a 
gauge force of roughly the same size as the fermion force. Due to the high cost of the fermion inversions the gauge 
updates are still a negligible fraction of the cost the whole update. 
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FIG. 4: Number of applications of the kernel operator on a vector per step and pseudo-fermion field as a function of the gauge 
coupling. The runs at am q = 0.05 where done with 2 pseudo-fermions, the runs at am q = 0.1 with three. 

A trajectory of molecular dynamics evolution had a typical time step of St = 1/20 for am q = 0.05 and St = 1/15 
for am q = 0.1. At each step we computed n c ; g = 8 eigenmodes of the kernel operator. The cost of these eigenmodes 
is 5%— 10% overhead in the confined phase and rises to as large as 50% at high (3 where the inversion is much cheaper. 
One has to make sure that the gauge field moves between two computations of the eigenvalues only such that an 
eigenvalue which has changed sign is among the n modes before and after the elementary step. This along with 
the requirement that preferably not more than one eigenmode changes sign per elementary step are the limiting 
requirements to the step size. 

We ran simulations on our array of 31 old 800 Mhz P-III's and 12 new 3.2 Ghz P-IVE's, all in single-processor 
mode. At (3 = 7.35 and am q = 0.1 we collect approximately 40 trajectories per day on the newer machines. Fig. 0] 
gives the dependence on the gauge coupling of the number of applications for the kernel operator on a vector per 
elementary step and pseudo-fermion field. Our code is based on the MILC package |30j. with SSE routinesJU for all 
SU(3) matrix multiplication. 

We now describe explorations of various parameter choices. 

First, there are some small tricks. We always begin the outer CG for the fermionic force with trial vectors taken 
by interpolating solutions from the previous two time steps. We also extrapolate the identified eigenvectors of h to 
begin the Conjugate Gradient calculation of new eigenmodes. Careful tuning of the r cu t parameter (recall Eq. d can 
reduce the number of inner CG steps (this from a (3 = 7.2, m = 0.05, St = 1/20 run) from about 35 to about 20. 

Fig. 03 illustrates the effect of Sexton- Weingarten updating. We plot the average over e AE (AE being the energy 
violation after one molecular dynamics trajectory of length 1) which governs the acceptance probability. It is from a 
run with (3 = 7.4 and am q = 0.1 with time-step St = 1/15. We started from the same equilibrium configuration and 
average over 19 trajectories. We see a considerable reduction in energy non-conservation as nsw increases. In this 



run, we can compare the fermion force per direction yJ2 x Tri^(a;)F A1 (a;) to the gauge force: the gauge force is about 
an order of magnitude larger. So setting nsw — 12 reduces the gauge impulse to roughly the level of the fermion 
impulse. 

This factor of 10 falls to about a factor of five in thin link simulations: the fermion force doubles. This probably 
happens because thin link fcrmions do not decouple from high-momentum gluon modes. A glance at a time history 
of AE shows Sexton- Weingarten updating works less well; one would simply have to lower the molecular dynamics 
time step St to reduce energy violation. 

We typically run with either N p — 2 or 3 pseudo-fermion fields. As we are going to discuss in Sec. E] this increases 
the number of changes of topology significantly whereas it has no impact on the step size. In order to check the 
implementation of the additional pseudo-fermions we compared the plaquette for N p = 1, 2 and 3 fields. They agreed 
within statistical errors. 



It has been long known that fat links can significantly decrease the computational cost of a pply ing the overlap 
operator on a vector [ll|. They also have a beneficial effect on the locality of the overlap operator |32| . Recently stout 
smearing 9j, which is analytic in the gauge fields and thus suitable for the HMC algorithm, has become available. 




IV. IMPACT OF FAT LINKS 
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FIG. 5: The average acceptance probability e AE with AE the violation in the microcanonical energy as a function of the number 
of Sexton- Weingarten steps nsw- We average over 19 trajectories which started from the same equilibrium configuration at 
j3 — 7.4 and am q — 0.1. The step-size is St = 1/15 and the trajectory length is 1 in all cases. The average is always consistent 
with one as required by reversibility. 
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TABLE I: We give relevant quantities for the performance of the HMC algorithm for different levels of stout smearing. The 
data is take at /3 = 7.2 and am q — 0.05. We list the number of stout steps, the acceptance rate, the number of trajectories and 
the number of reflections and refractions during these trajectories. The last column gives the average number of applications 
of the kernel operator on a vector per trajectory. 
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The stout link U for one level of isotropic smearing is constructed from the gauge links by 



.(*) 



TH 



(31) 



with Va[x) the sum over all staples associated with the link and [• • - ]th the traceless Hermitian part of the matrix. 
The contribution of this smearing to the fermion force has been worked out in Ref . Q . 

A main effect of the smearing is to remove the majority of the eigenvalues of h(—Ro) close to 0. This has two 
effects. First, it implies that one has to compute fewer eigenvectors of the kernel operator to capture the ones within 
a certain distance to the origin. For the same number of inner eigenmodes the condition number of h 2 is therefore 
smaller for the fat links than for thin links. This speeds up the inversion of the fermion Dirac operator. Furthermore, 
the computation of the fermion force is more stable since it is mainly plagued by small eigenmodes which are not 
well separated. This also implies a lower probability of two eigenmodes trying to change sign in the same HMC step, 
a problematic part of this algorithm. It furthermore might mean that the topology changes more rarely measured 
in HMC time and therefore lead to a higher auto-correlation time. However, it is not clear whether each of these 
attempts has the same probability of crossing over to the other topological sector or whether there is a correlation 
between consecutive attempts. This can only be clarified by an actual simulation. 

To get an idea of the effect of this smearing, we started runs from an equilibrium configuration at = 7.2 and mass 
am q = 0.05 with St — 1/20. We ran 16 trajectories with thin links, 101 with one level of stout smearing at p — 0.15 
and 591 with two levels of smearing. The plaquette is (TrU p ) w 1.67 and the plaquette made of stout links is 2.5 for 
one level of stout links and 2.8 for two levels. In each of the runs we used two pseudo-fcrmion fields. The results of 
these runs are summarized in Tabled We find that each level of smearing reduces the cost per trajectory by roughly a 
factor of 3. The acceptance rate is the same for thin links and one level of smearing, however, it improves significantly 
for two levels. This is probably a result of the lower density of eigenmodes of the kernel action. The number of 
reflections and refractions per trajectory decreases with more levels of smearing. It is more than compensated for by 
the higher speed of the simulation. 

Ref. ^3 demonstrates that insufficient localization of the low- lying modes of the kernel operator h(—Ro) might 
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FIG. 6: The inverse participation ratio as a function of the modulus of the eigenvalue of h(—Ro). The data is from the 8 
lowest eigenmodes only and is binned in bins of size 0.1. It is taken from a run at /3 = 7.2, am = 0.05. The modes of the 
Dirac operator constructed from thin links {n smear = 0) are only weakly localized (small IPR). Each smearing step increases 
the localization of the modes. The spectrum of h(—Ro) is denser near for fewer smearing steps. Without smearing, we do 
not observe eigenvalues larger than 0.18 among the 8 lowest modes. 



have implications on the locality of the overlap operator. We can get a rough measurement of the localization of the 
modes through the inverse participation ratio (V is the lattice volume) 

7 = F5> + (z)V^)] 2 - (32) 

X 

If a mode is localized on a fraction / of the lattice volume, we expect I — 1//. Fig. shows a comparison of / from 
a set of simulations at fj — 7.2, am = 0.05, with either one or two levels of p = 0.15 stout smearing, or simulations 
with a thin link. The inverse participation ratio (IPR) for the lowest eight modes has been binned and averaged. We 
immediately see that eigenmodes of h(—Ro) become progressively more localized with increasing smearing. The thin 
link kernel eigenmodes are large on about ten per cent of the lattice volume. This falls precipitously with smearing, 
to about two per cent for two smearing steps. We conclude that simulations with thin links at this set of parameter 
values are not only expensive, they are dangerously close to derealization. (We are well aware, that we are comparing 
actions with different bare parameters in the fermion sector.) Since the lattice spacing for different fermion actions 
(due to different levels of smearing) might be different, we also computed the eigenmodes of the thin link h(—Ro) 
operator on the configurations from the fat link data set. We found essentially no difference to the IPR from the thin 
link data set. 

We also notice that the range of eigenvalue of the lowest eight modes is considerably compressed as smearing is 
removed. This means that the range of the Zolotarev approximation of the step function must be increased, with a 
consequent increase in its evaluation cost due to the increased ill-conditioning of h 2 + Cj. 



V. CHANGING THE TOPOLOGICAL SECTOR 



When running with only one pseudo-fermion field we found it very difficult to change topological sector, i.e. the 
algorithm reflected most of the time at the boundary. We observed that the height of the step was typically O(200) 
compared to \(N, tt)\ 2 which is 0(1). This is the case independent of the direction of the change in topology, e.g. 
whether one changes from Q = to Q = 1 or vice versa. It only occurs rarely that AS* is such that a change in 
topology is possible and one gets long sequences without a change in topology indicating a high auto-correlation time. 

Why is this so? The <j> field in the HMC algorithm with one pseudo-fermion field is generated with the distribution 
exp(—(f> + H~ 2 <fi). This is done by generating a Gaussian random vector R and then multiplying it by H . The expression 
which gives the contribution of the fermionic determinant during the evolution is thus given by exp{-\H^H R\ 2 ) with 
Hq the Hermitian Dirac operator at the beginning of the trajectory and H\ the 'current' Hermitian Dirac operator. In 
Ref. |3^ | it was found that this is a good estimator for the change in the fermionic determinant only if the eigenvalues 
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FIG. 7: Low-lying spectrum of H(0) 2 for a set of /3 = 7.3, m 
Octagons and crosses show the spectrum on either side. 



0.1 lattices on either side of a topology-changing boundary. 



of Hq and H\ are very similar. This is likely the case as long as one stays in the same topological sector. However, 
the eigenvalue spectrum changes significantly when the topological sector changes, e.g. two eigenvalues of H, ±A, are 
transformed to {to, 2Rq} or vice versa. To be precise, the change in the determinants is given by 



det H+H 1 
dct H+H 



dRdW 



(33) 



with f2 = HqH~ Hq — 1. The exponential function is bounded by zero from below. So large fluctuations in 
f[R] = exp(— R + ttR) mean lots of events with f[R] ss and only very few ones with f[R] large. For R + ilR this 
means a large number of (very) large values and only very few small ones. Small fluctuations, however, mean that 
R + QR has values in a small interval around some constant given by the determinant ratio. 

Thus, while H\ is in the same chiral sector as Hq, the change in the fermionic action is small. However, when 
we cross to the other chiral sector, the large fluctuations in the estimator are most likely such that AS 1 / is a large 
number, which means that the trajectory will reflect from the boundary and stay in the same topological sector. It 
is therefore pivotal to employ a better estimator for the change in the determinant. 

As an illustration of the change in the spectrum, we have computed a set of low-lying eigenmodes of H(0) 2 on 
either side of a topology change. Specifically, we move onto the critical surface and compute the spectrum of H(0) 2 
twice, once with the eigenvalue of the lowest eigenmode set to its value just before the crossing, and then a second 
time with its eigenvalue's sign flipped. The parameters are (3 — 7.3, m = 0.1. Fig. [7|shows that not only does a zero 
mode appear or disappear, but all the low lying eigenvalues of H(0) 2 change discontinuously. This is to be expected: 
eigenmodes of Hcrmitian matrices repel, and the appearance of a zero mode (for example) must push nearby modes 
away. 

To reduce fluctuations, we decided to use the method proposed in Refs. [Til Il5|. which consists of rewriting the 
fermion determinant as in Eq. I|10|) . In this method, only determinant ratios are evaluated using pseudo-fermions for 
the light quark masses. The change in the spectra while changing topological sector of the ratio H{m)/ H(m!) can be 
expected to be less dramatic than the change of the spectrum of H{m). Only the determinant of Him^) is evaluated 
directly. However, for a large mass tojv the spectrum of H 2 is confined to a smaller region between m 2 N and AR 2 and 
the change in the spectrum therefore less drastic than for a smaller mass. In order to distribute the contribution to 
the fermionic action equally among the different pseudo- fermion fields we choose the (N — 1) auxiliary masses for a 
sea quark mass of TOi to be 



TO, 



(N-i+l)/N 



(34) 



In Fig.|S]we show the distribution of 2A5/ and \(N, tt)\ 2 . The topological sector changes if 2A5/ < \{N, ir)\ 2 . The 
data is taken from a run at j3 = 7.2 and am q — 0.05 and St — 1/20. We see that for the standard HMC with one 
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No. of PF 


trajectories 


attempts/traj. 


refr. prob. 


MatVec / traj. 


acc. rate 


1 


602 


1.3(1) 


0.5(3)% 


2.3(1) • 10 5 


90% 


2 


591 


1.2(1) 


2.0(6)% 


3.4(1) • 10 5 


93% 


3 


290 


1.6(2) 


3(1)% 


5.5(4) ■ 10 5 


89% 



TABLE II: The effect of additional pseudo-fermion fields. We list the the number of trajectories we analyzed, the attempts to 
change topology (reflections plus refractions) per trajectory, the percentage of refractions and the number of applications of 
h(—Ro) on a vector per trajectory. In the last column we give the acceptance rate. The data is taken at /3 = 7.2 and am — 0.05. 
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FIG. 8: Distribution of the two components which determine whether a trajectory reflects or refracts at the boundary between 
two topological sectors. Refraction takes place if 2AS < \{N,H}\ 2 . The data is taken at j3 — 7.2 and am q = 0.05. For the 1 
pseudo-fermion distribution of 2AS we do not show about 5% of the data which is above 3000. 



pseudo-fermion field the distribution of 2ASf is spread out up to values of several thousand whereas the distribution 
of \(N, 7r)| 2 only spreads up to w 10. (The distribution of \(N,ir)\ 2 is independent of the number of pseudo-fermion 
fields.) The spread of 2A5/ shrinks considerably with more pseudo-fermion fields. 

In these runs we observed roughly 1.5 attempts per trajectory to cross over to another topological sector independent 
of the number of pseudo-fermion fields. (For a summary of the results see Table lli*|) The additional fields have a 
significant impact on the rate of topology changes. Whereas a change only occurs once in roughly 200 trajectories for 
one pseudo-fermion field, it occurs once per 35 trajectories for three pseudo-fermions. However, the additional fields 
seem to have no impact on the acceptance rate. The additional cost of the auxiliary fields is thus only justified by 
the higher tunneling rate and does not pay back in a possibility to decrease the step size as in the case of simulations 
with Wilson fcrmions. It appears from Table UTI that a single additional set of pseudofermions is sufficient. 

In Fig. El we show the time history of the topological charge as a function of simulation time for three values of j3 
and am q =0.1. The runs were done with three pseudo-fermion fields. Below the phase transition we observe a lot 
of changes in topology and \Q\ up to 2. This is as expected since chiral symmetry is expected to be broken and the 
lattice spacing is large. With larger the lattice spacing gets smaller, the temperature gets larger and we therefore 
expect fewer instantons and a smaller topological charge. This is confirmed by our simulations. 



VI. CONCLUSIONS 



Simulations with two flavors of light dynamical overlap fermions with lattice spacing near a ~ 1/(6T C ) and am q = 
0.05 (m ~ 50 MeV?) and small volume (a few fm 4 ) are certainly feasible on arrays of contemporary (2004) work 
stations. The crucial ingredient needed to do such simulations is a fermion action with a fat link gauge connection. 

To be definite, we summarize our action: The gauge action is given by Eq. (|28|l with uq = 0.86. The kernel action 
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FIG. 9: The history of the topological charge in simulation time for am q = 0.1 at three values of 0. f3 — 7.25 is probably below 
the phase transition whereas f3 = 7.55 is above. The runs where done with three pseudo-fermion fields. 

is that of Eq.lSjwith p\ = —1/6, p2 = —1/18, a clover term with coefficient Csw = 1-278, and the gauge connection is 
a two-level stout link with smearing parameter p = 0.15. We use it in the overlap with (negative) mass shift Rq = 1.2. 

We are quite aware, that we have not completely characterized how much more efficient fat link overlap actions 
are, than the conventional thin link ones. Table |U suggests a factor of 10 at equal 6t and a (probably a factor of 2) 
smaller St is needed for equivalent acceptance rate. Suffice it to say, thin link overlap is so expensive that wc could 
not do anything useful with it with our available resources. Observations such as Fig. suggest that thin link overlap 
might require a smaller lattice spacing than a ~ 1/(6T C ) to avoid delocalized kernel eigenmodes. 

The use of additional pseudo-fermion fields as suggested by Hasenbusch is very beneficial to the rate at which the 
topological sector changes. However, at our parameter values it seems to have no impact on the HMC acceptance 
rate. 

Conventional large-lattice simulations are of course still not possible with these actions with small computer re- 
sources, but there are physics issues whose solution is much more sensitive to exact chiral symmetry than to large 
volume. We hope to address them in the near future. 
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